Optimize /ui/dags endpoint serialization#61483

Open
john-rodriguez-mgni wants to merge 1 commit into apache:main from
john-rodriguez-mgni:fix/dags-endpoint-performance

Conversation

@john-rodriguez-mgni

Summary

This PR addresses a significant performance issue in the /ui/dags endpoint where page load times scaled poorly with the number of DAGs (12-16 seconds for just 25 DAGs in our testing).

Two optimizations are implemented:

1. Cache URLSafeSerializer for file_token generation

Previously, a new URLSafeSerializer was instantiated and conf.get_mandatory_value() was called for every DAG in the response. Now uses @lru_cache to create the serializer once and reuse it.
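
A minimal sketch of the caching pattern described above, for readers following along. It is not the PR's code: `ToySerializer`, `get_secret_key`, `get_serializer`, and `file_token` are hypothetical stand-ins (the real change caches an `itsdangerous` `URLSafeSerializer` built from a value read via `conf.get_mandatory_value()`), and the toy signer exists only to keep the example runnable with the standard library.

```python
import hashlib
import hmac
from functools import lru_cache

# Counts how often the (hypothetical) configuration lookup runs.
CONFIG_LOOKUPS = 0


def get_secret_key() -> str:
    """Hypothetical stand-in for the per-call configuration lookup."""
    global CONFIG_LOOKUPS
    CONFIG_LOOKUPS += 1
    return "not-a-real-secret"


class ToySerializer:
    """Toy signer used only for illustration; not the itsdangerous class."""

    def __init__(self, secret: str) -> None:
        self._secret = secret.encode()

    def dumps(self, value: str) -> str:
        digest = hmac.new(self._secret, value.encode(), hashlib.sha256).hexdigest()
        return f"{value}.{digest}"


@lru_cache(maxsize=1)
def get_serializer() -> ToySerializer:
    # The body runs on the first call only; later calls return the cached instance.
    return ToySerializer(get_secret_key())


def file_token(fileloc: str) -> str:
    return get_serializer().dumps(fileloc)


# 25 tokens, but only one serializer construction and one config lookup.
tokens = [file_token(f"/dags/dag_{i}.py") for i in range(25)]
```

One trade-off of this pattern: the cached instance does not notice a changed secret until the process restarts or `get_serializer.cache_clear()` is called.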

2. Eliminate redundant Pydantic validation in response construction

The original pattern used:

DAGWithLatestDagRunsResponse.model_validate({
    **DAGResponse.model_validate(dag).model_dump(),
    ...
})

This caused triple serialization overhead per DAG (validate → dump → validate). The fix validates once with DAGResponse.model_validate(), then uses model_construct() to build DAGWithLatestDagRunsResponse without redundant validation.
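
The difference between the two patterns can be sketched with Pydantic v2 as follows. The models and the `DagRow` class are simplified, hypothetical stand-ins for `DAGResponse`, `DAGWithLatestDagRunsResponse`, and the ORM object; the field names are assumptions made for illustration only.

```python
from pydantic import BaseModel, ConfigDict


class DagRow:
    """Hypothetical stand-in for an ORM row."""

    def __init__(self, dag_id: str, is_paused: bool) -> None:
        self.dag_id = dag_id
        self.is_paused = is_paused


class DagOut(BaseModel):
    # from_attributes lets model_validate read plain object attributes.
    model_config = ConfigDict(from_attributes=True)

    dag_id: str
    is_paused: bool


class DagWithRunsOut(DagOut):
    latest_run_ids: list[str]


def build_with_revalidation(row: DagRow, run_ids: list[str]) -> DagWithRunsOut:
    # validate -> dump -> validate: every base field is validated twice.
    return DagWithRunsOut.model_validate(
        {**DagOut.model_validate(row).model_dump(), "latest_run_ids": run_ids}
    )


def build_with_construct(row: DagRow, run_ids: list[str]) -> DagWithRunsOut:
    base = DagOut.model_validate(row)  # the only validation pass
    # model_construct performs no validation, so the caller is responsible
    # for passing complete and correctly typed values.
    return DagWithRunsOut.model_construct(**dict(base), latest_run_ids=run_ids)


row = DagRow("example_dag", False)
slow = build_with_revalidation(row, ["run_1"])
fast = build_with_construct(row, ["run_1"])
```

Because `model_construct()` skips validation entirely, the extra fields passed to it (here `latest_run_ids`) are trusted as-is.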

Results

Together, these changes reduced page load time from 12-16 seconds to ~130ms in our dev environment.

Test Plan

  • Existing unit tests pass
  • Manual testing of /ui/dags endpoint with varying numbers of DAGs
  • Verify file_token generation still works correctly

Made with Cursor

@boring-cyborg

boring-cyborg bot commented Feb 5, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@boring-cyborg boring-cyborg bot added the area:API Airflow's REST/HTTP API label Feb 5, 2026
@john-rodriguez-mgni john-rodriguez-mgni force-pushed the fix/dags-endpoint-performance branch from b8ac2cf to 1350bfa Compare February 5, 2026 16:25
Member

@pierrejeambrun pierrejeambrun left a comment

Nice, thanks for the PR.

A few questions/suggestions, but overall looking good.

@pierrejeambrun pierrejeambrun added this to the Airflow 3.1.8 milestone Feb 9, 2026
@pierrejeambrun pierrejeambrun changed the title perf(api): optimize /ui/dags endpoint serialization Optimize /ui/dags endpoint serialization Feb 9, 2026
Member

@pierrejeambrun pierrejeambrun left a comment

Nice, thanks for the PR.

A couple of suggestions/improvements before we can merge, I think.

@john-rodriguez-mgni john-rodriguez-mgni force-pushed the fix/dags-endpoint-performance branch from 1350bfa to f1a3a5c Compare February 10, 2026 04:55
Comment on lines 193 to 192
DagRun.duration.expression, # type: ignore[attr-defined]
DagRun.duration,
Member

Hm, #58352 (cc @KoviAnusha)

Member

It seems we have to keep DagRun.duration.expression, # type: ignore[attr-defined] for mypy.

Member

@pierrejeambrun pierrejeambrun left a comment

LGTM, we can merge once the comments above are addressed.

@pierrejeambrun pierrejeambrun added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Feb 11, 2026
@john-rodriguez-mgni john-rodriguez-mgni force-pushed the fix/dags-endpoint-performance branch from f1a3a5c to 153d4c9 Compare February 11, 2026 21:22
Member

@jason810496 jason810496 left a comment

Nice! Thanks for the improvement.

Comment on lines 193 to 192
DagRun.duration.expression, # type: ignore[attr-defined]
DagRun.duration,
Member

It seems we have to keep DagRun.duration.expression, # type: ignore[attr-defined] for mypy.

@john-rodriguez-mgni
Author

Just an update on this: since opening this PR, I have included this patch in our dev and QA environments and validated that there are no issues and that it resolves the load time for the ui/dags endpoint seen with the base Airflow 3.1.7 release without patches. Earlier today I released it to our production environment; we are not seeing any issues there either, and I can confirm that the load time for the ui/dags endpoint is fixed.

Let me know what else is required to move this along. Again, really appreciate your feedback and thank you for your help throughout this process!

Member

@pierrejeambrun pierrejeambrun left a comment

Let me know what else is required to move this along. Again, really appreciate your feedback and thank you for your help throughout this process!

Can you just fix the CI, please (mypy static check error)? There are two discussions above that explain what the problem is.

Basically we need to keep the # type: ignore[attr-defined] and revert the change to the get_latest_run_info which is creating mypy errors and isn't really needed for that PR. (that endpoint isn't the performance bottleneck)

@john-rodriguez-mgni
Author

Let me know what else is required to move this along. Again, really appreciate your feedback and thank you for your help throughout this process!

Can you just fix the CI, please (mypy static check error)? There are two discussions above that explain what the problem is.

Basically we need to keep the # type: ignore[attr-defined] and revert the change to the get_latest_run_info which is creating mypy errors and isn't really needed for that PR. (that endpoint isn't the performance bottleneck)

Made updates as requested!

@john-rodriguez-mgni
Author

@pierrejeambrun just checking on this to see if there was anything else required on my end, I have made the changes necessary to address the CI failures... I think we just need someone to approve the workflows.

Member

@pierrejeambrun pierrejeambrun left a comment

Thanks, CI is running; it should be better 🤞

@pierrejeambrun pierrejeambrun force-pushed the fix/dags-endpoint-performance branch from 86dd884 to 460629c Compare February 23, 2026 16:23
Member

@pierrejeambrun pierrejeambrun left a comment

CI needs fixing; some tests are failing. Can you take a look, please?

@john-rodriguez-mgni
Author

@pierrejeambrun it seems that all 10 CI failures trace back to the same root cause: the allowed_run_types field was added to DAGResponse on the main branch after this branch was created. Since the optimization switches from DAGResponse.model_validate(dag) (which dynamically picks up all model fields from the ORM object) to an explicit dict construction, the new field is missing from the dict, causing a ValidationError: allowed_run_types - Field required on DAGWithLatestDagRunsResponse. The 7 API/Serialization suite failures hit this directly, and the 3 UI e2e failures (Chromium/Firefox/WebKit) are a downstream consequence: the /ui/dags endpoint returns 500, so the dags-list e2e tests can't render anything.
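
For anyone wanting to reproduce the failure mode in isolation, here is a small sketch with Pydantic v2. The two models are hypothetical, heavily reduced stand-ins for `DAGResponse` and `DAGWithLatestDagRunsResponse`; only `allowed_run_types` is taken from the discussion above, and the other field names are assumptions.

```python
from pydantic import BaseModel, ValidationError


class DagOut(BaseModel):
    dag_id: str
    allowed_run_types: list[str]  # field added to the base model later


class DagWithRunsOut(DagOut):
    latest_run_ids: list[str]


# An explicit dict written before allowed_run_types existed.
explicit = {"dag_id": "example_dag", "latest_run_ids": []}

try:
    DagWithRunsOut.model_validate(explicit)
    missing = []
except ValidationError as exc:
    # Collect the names of the required fields that were not supplied.
    missing = [err["loc"][0] for err in exc.errors() if err["type"] == "missing"]
```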

The immediate fix is simple — add "allowed_run_types": dag.allowed_run_types to the dict. However, this highlights a fragility in the explicit-field approach: any new field added to DAGResponse requires a corresponding update here, which is easy to miss. To make this more robust, we could build the dict dynamically from DAGResponse.model_fields which iterates over the model's field definitions to pull attributes from the ORM object, so new fields are picked up automatically. The dict construction is still just cheap getattr calls, and there's still only one model_validate call. Overhead of iterating model_fields is negligible compared to Pydantic validation.
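
A rough sketch of the dynamic approach proposed above, again with hypothetical stand-in models and a hypothetical `DagRow` class rather than the real Airflow classes. It assumes the model's field names match the ORM attribute names, with no aliases or computed fields.

```python
from pydantic import BaseModel


class DagRow:
    """Hypothetical stand-in for an ORM row."""

    def __init__(self) -> None:
        self.dag_id = "example_dag"
        self.is_paused = False
        self.allowed_run_types = ["manual"]


class DagOut(BaseModel):
    dag_id: str
    is_paused: bool
    allowed_run_types: list[str]


class DagWithRunsOut(DagOut):
    latest_run_ids: list[str]


def build_response(row: DagRow, run_ids: list[str]) -> DagWithRunsOut:
    # Iterate over the base model's declared fields so that a field added
    # to DagOut later is picked up without touching this function.
    data = {name: getattr(row, name) for name in DagOut.model_fields}
    data["latest_run_ids"] = run_ids
    # Still a single validation pass over the combined dict.
    return DagWithRunsOut.model_validate(data)


response = build_response(DagRow(), ["run_1"])
```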

Thoughts?

@john-rodriguez-mgni
Author

@pierrejeambrun any thoughts on the proposed implementation: #61483 (comment)

This PR addresses a significant performance issue in the /ui/dags endpoint
where page load times scaled poorly with the number of DAGs (12-16 seconds
for just 25 DAGs in our testing).

Two optimizations are implemented:

1. Cache URLSafeSerializer for file_token generation
   - Previously, a new URLSafeSerializer was instantiated and
     conf.get_mandatory_value() was called for every DAG
   - Now uses @lru_cache to create the serializer once and reuse it

2. Eliminate redundant Pydantic validation in response construction
   - The original pattern used model_validate -> model_dump -> model_validate
     which caused triple serialization overhead per DAG
   - Now validates once with DAGResponse.model_validate(), then uses
     model_construct() to build DAGWithLatestDagRunsResponse

Together, these changes reduced page load time from 12-16 seconds to
~130ms in our dev environment.

Co-authored-by: Cursor <cursoragent@cursor.com>
@john-rodriguez-mgni john-rodriguez-mgni force-pushed the fix/dags-endpoint-performance branch from 460629c to fa124e2 Compare February 27, 2026 06:50
@john-rodriguez-mgni
Author

@pierrejeambrun I have decided to build the dict dynamically from DAGResponse.model_fields to make this robust. I ran the CI tests locally and confirmed that this fixes the issue. I also deployed it to our Airflow dev instances and verified that the dags endpoint performance gain is still present. The CI workflows are ready to be approved again; please approve them at your earliest convenience.

Labels

area:API Airflow's REST/HTTP API backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch


5 participants